AITopics | approach 1

approach 1

Dimension-adapted Momentum Outscales SGD

Neural Information Processing Systems, Jun-21-2026, 03:37:01 GMT

We investigate scaling laws for stochastic momentum algorithms with small batch on the power law random features model, parameterized by data complexity, target complexity, and model size. When trained with a stochastic momentum algorithm, our analysis reveals four distinct loss curve shapes determined by varying data-target complexities. While traditional stochastic gradient descent with momentum (SGD-M) yields identical scaling law exponents to SGD, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling momentum hyperparameters based on model size and data complexity. This outscaling phenomenon, which also improves compute-optimal scaling behavior, is achieved by DANA across a broad range of data and target complexities, while traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions and large-scale text experiments with LSTMs show DANA's improved loss exponents over SGD hold in a practical setting.

large language model, machine learning, neural information processing system, (19 more...)

Country: North America > Canada > Quebec (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Energy (0.45)
Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

Assessing model error in counterfactual worlds

Howerton, Emily, Lessler, Justin

arXiv.org Artificial Intelligence, Dec-2-2025

Counterfactual scenario modeling exercises that ask "what would happen if?" are one of the most common ways we plan for the future. Despite their ubiquity in planning and decision making, scenario projections are rarely evaluated retrospectively. Differences between projections and observations come from two sources: scenario deviation and model miscalibration. We argue the latter is most important for assessing the value of models in decision making, but requires estimating model error in counterfactual worlds. Here we present and contrast three approaches for estimating this error, and demonstrate the benefits and limitations of each in a simulation experiment. We provide recommendations for the estimation of counterfactual error and discuss the components of scenario design that are required to make scenario projections evaluable.

artificial intelligence, modeling & simulation, scenario, (16 more...)

2512.00836

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Enhanced Sentiment Analysis of Iranian Restaurant Reviews Utilizing Sentiment Intensity Analyzer & Fuzzy Logic

Rokhva, Shayan, Teimourpour, Babak, Babaei, Romina

arXiv.org Artificial Intelligence, Mar-15-2025

This research presents an advanced sentiment analysis framework studied on Iranian restaurant reviews, combining fuzzy logic with conventional sentiment analysis techniques to assess both sentiment polarity and intensity. A dataset of 1266 reviews, alongside corresponding star ratings, was compiled and preprocessed for analysis. Initial sentiment analysis was conducted using the Sentiment Intensity Analyzer (VADER), a rule-based tool that assigns sentiment scores across positive, negative, and neutral categories. However, a noticeable bias toward neutrality often led to an inaccurate representation of sentiment intensity. To mitigate this issue, based on a fuzzy perspective, two refinement techniques were introduced, applying square-root and fourth-root transformations to amplify positive and negative sentiment scores while maintaining neutrality. This led to three distinct methodologies: Approach 1, utilizing unaltered VADER scores; Approach 2, modifying sentiment values using the square root; and Approach 3, applying the fourth root for further refinement. A Fuzzy Inference System incorporating comprehensive fuzzy rules was then developed to process these refined scores and generate a single, continuous sentiment value for each review based on each approach. Comparative analysis, including human supervision and alignment with customer star ratings, revealed that the refined approaches significantly improved sentiment analysis by reducing neutrality bias and better capturing sentiment intensity. Despite these advancements, minor over-amplification and persistent neutrality in domain-specific cases were identified, leading us to propose several future studies to tackle these occasional barriers. The study's methodology and outcomes offer valuable insights for businesses seeking a more precise understanding of consumer sentiment, enhancing sentiment analysis across various industries.
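The root transformations behind Approaches 2 and 3 can be illustrated with a minimal sketch. It assumes the scores lie in [0, 1] and that "maintaining neutrality" means the neutral score is left unchanged; the function name `amplify` and the example scores are hypothetical, and the Fuzzy Inference System stage is not shown.

```python
def amplify(scores, root):
    """Take the root-th root of the positive and negative scores.

    For values in [0, 1] a root pushes small scores upward, which is the
    amplification effect described in the abstract. The neutral score is
    left as-is (an assumption about what "maintaining neutrality" means).
    """
    return {
        "pos": scores["pos"] ** (1.0 / root),
        "neg": scores["neg"] ** (1.0 / root),
        "neu": scores["neu"],
    }

# Hypothetical VADER-style scores for one review (made-up values).
raw = {"pos": 0.16, "neg": 0.04, "neu": 0.80}

approach_1 = amplify(raw, 1)  # Approach 1: unaltered scores
approach_2 = amplify(raw, 2)  # Approach 2: square root
approach_3 = amplify(raw, 4)  # Approach 3: fourth root

for name, result in [("1", approach_1), ("2", approach_2), ("3", approach_3)]:
    print(f"Approach {name}: pos={result['pos']:.3f} neg={result['neg']:.3f} neu={result['neu']:.3f}")
```

Because a higher-order root lifts small values more strongly, the fourth root amplifies weak positive and negative signals more than the square root does, which is consistent with the over-amplification the abstract reports as a limitation.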

artificial intelligence, natural language, sentiment analysis, (16 more...)

2503.12141

Country:

Europe > Hungary > Budapest > Budapest (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (0.93)

Industry: Consumer Products & Services > Restaurants (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Gemstones: A Model Suite for Multi-Faceted Scaling Laws

McLeish, Sean, Kirchenbauer, John, Miller, David Yu, Singh, Siddharth, Bhatele, Abhinav, Goldblum, Micah, Panda, Ashwinee, Goldstein, Tom

arXiv.org Artificial Intelligence, Feb-7-2025

Scaling laws are typically fit using a family of models with a narrow range of frozen hyperparameter choices. In this work we study scaling laws using a wide range of architecture and hyperparameter choices, and highlight their impact on resulting prescriptions. As a primary artifact of our research, we release the Gemstones: the most comprehensive open-source scaling law dataset to date, consisting of over 4000 checkpoints from transformers with up to 2 billion parameters; these models have been trained with different learning rates, cooldown schedules, and architectural ...

Our models, called the Gemstones because they are loosely based on scaled-down variants of the Gemma architecture, vary in their parameter count, width/depth ratio, training tokens, learning rates, and cooldown schedules. By fitting scaling laws to these checkpoints, we confirm that scaling law parameters and interpretations indeed depend strongly on the selection of models and fitting procedure used, and we quantify the degree to which these decisions impact predictions. By exploiting the variation among our model checkpoints, we also fit a number of unique scaling laws and analyze their predictions to discern whether they are consistent with design choices we see in industry models.

gemstone, large language model, machine learning, (17 more...)

2502.06857

Country:

North America > United States > Maryland (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Dynamic Multimodal Sentiment Analysis: Leveraging Cross-Modal Attention for Enabled Classification

Lee, Hui, Suniljit, Singh, Ong, Yong Siang

arXiv.org Artificial Intelligence, Jan-14-2025

This paper explores the development of a multimodal sentiment analysis model that integrates text, audio, and visual data to enhance sentiment classification. The goal is to improve emotion detection by capturing the complex interactions between these modalities, thereby enabling more accurate and nuanced sentiment interpretation. The study evaluates three feature fusion strategies -- late stage fusion, early stage fusion, and multi-headed attention -- within a transformer-based architecture. Experiments were conducted using the CMU-MOSEI dataset, which includes synchronized text, audio, and visual inputs labeled with sentiment scores. Results show that early stage fusion significantly outperforms late stage fusion, achieving an accuracy of 71.87%, while the multi-headed attention approach offers marginal improvement, reaching 72.39%. The findings suggest that integrating modalities early in the process enhances sentiment classification, while attention mechanisms may have limited impact within the current framework. Future work will focus on refining feature fusion techniques, incorporating temporal data, and exploring dynamic feature weighting to further improve model performance.
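The difference between early and late stage fusion can be shown with a toy sketch. It assumes a plain linear scorer in place of the paper's transformer-based architecture; every function name, feature vector, and weight below is hypothetical, and the multi-headed attention variant is not shown.

```python
def linear_score(features, weights):
    """Weighted sum of features: a stand-in for a trained model."""
    return sum(f * w for f, w in zip(features, weights))

def early_fusion(text, audio, visual, joint_weights):
    """Concatenate the modality features first, then score them jointly."""
    fused = text + audio + visual
    return linear_score(fused, joint_weights)

def late_fusion(text, audio, visual, w_text, w_audio, w_visual):
    """Score each modality on its own, then average the three decisions."""
    scores = [
        linear_score(text, w_text),
        linear_score(audio, w_audio),
        linear_score(visual, w_visual),
    ]
    return sum(scores) / len(scores)

# Made-up per-modality feature vectors for a single utterance.
text, audio, visual = [1.0, 2.0], [0.5], [3.0]

early = early_fusion(text, audio, visual, [0.1, 0.2, 0.3, 0.4])
late = late_fusion(text, audio, visual, [0.1, 0.2], [0.3], [0.4])
print(f"early fusion score: {early:.3f}")
print(f"late fusion score:  {late:.3f}")
```

In a learned model, early fusion lets the scorer weigh interactions across modalities, whereas late fusion only combines decisions that were already made per modality.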

artificial intelligence, machine learning, natural language, (16 more...)

2501.08085

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Lexicographic optimization-based approaches to learning a representative model for multi-criteria sorting with non-monotonic criteria

Zhang, Zhen, Li, Zhuolin, Yu, Wenyu

arXiv.org Artificial Intelligence, Sep-3-2024

Deriving a representative model using value function-based methods from the perspective of preference disaggregation has emerged as a prominent and growing topic in multi-criteria sorting (MCS) problems. A noteworthy observation is that many existing approaches to learning a representative model for MCS problems traditionally assume the monotonicity of criteria, which may not always align with the complexities found in real-world MCS scenarios. Consequently, this paper proposes some approaches to learning a representative model for MCS problems with non-monotonic criteria through the integration of the threshold-based value-driven sorting procedure. To do so, we first define some transformation functions to map the marginal values and category thresholds into a UTA-like functional space. Subsequently, we construct constraint sets to model non-monotonic criteria in MCS problems and develop optimization models to check and rectify the inconsistency of the decision maker's assignment example preference information. By simultaneously considering the complexity and discriminative power of the models, two distinct lexicographic optimization-based approaches are developed to derive a representative model for MCS problems with non-monotonic criteria. Eventually, we offer an illustrative example and conduct comprehensive simulation experiments to elaborate the feasibility and validity of the proposed approaches.

approach 1, criteria, marginal value function, (11 more...)

2409.01612

Country:

Asia > China > Liaoning Province > Dalian (0.04)
South America > Brazil (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Chinchilla Scaling: A replication attempt

Besiroglu, Tamay, Erdil, Ege, Barnett, Matthew, You, Josh

arXiv.org Artificial Intelligence, May-14-2024

Hoffmann et al. (2022) propose three methods for estimating a compute-optimal scaling law. We attempt to replicate their third estimation procedure, which involves fitting a parametric loss function to a reconstruction of data from their plots. We find that the reported estimates are inconsistent with their first two estimation methods, fail at fitting the extracted data, and report implausibly narrow confidence intervals--intervals this narrow would require over 600,000 experiments, while they likely only ran fewer than 500. In contrast, our rederivation of the scaling law using the third approach yields results that are compatible with the findings from the first two estimation procedures described by Hoffmann et al.
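For orientation, the parametric loss usually associated with Hoffmann et al.'s third method has the form L(N, D) = E + A/N^alpha + B/D^beta, where N is the parameter count and D the number of training tokens. The sketch below evaluates that form and the closed-form compute-optimal split under the common approximation C ≈ 6·N·D. The coefficient values are illustrative placeholders, not estimates from either paper, and the function names are hypothetical.

```python
def parametric_loss(n_params, n_tokens, E, A, B, alpha, beta):
    """Loss as an irreducible term plus model-size and data-size penalties."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

def compute_optimal_split(compute, A, B, alpha, beta):
    """Minimize the loss subject to compute = 6 * N * D.

    Substituting D = K / N with K = compute / 6 and setting the derivative
    with respect to N to zero gives N = G * K ** (beta / (alpha + beta)),
    where G = (alpha * A / (beta * B)) ** (1 / (alpha + beta)).
    """
    k = compute / 6.0
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = g * k ** (beta / (alpha + beta))
    d_opt = k / n_opt
    return n_opt, d_opt

# Placeholder coefficients chosen for easy arithmetic (not fitted values).
coeffs = dict(A=400.0, B=400.0, alpha=0.3, beta=0.3)
E = 2.0

n_opt, d_opt = compute_optimal_split(6e18, **coeffs)
best = parametric_loss(n_opt, d_opt, E, **coeffs)
skewed = parametric_loss(2 * n_opt, d_opt / 2, E, **coeffs)  # same compute, worse split
print(f"N_opt={n_opt:.3e} D_opt={d_opt:.3e} loss={best:.4f} skewed_loss={skewed:.4f}")
```

With symmetric placeholder coefficients the optimal split is balanced between parameters and tokens. With fitted coefficients, the balance depends on the ratio of alpha to beta, which is why the precision of those estimates matters to the replication.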

confidence interval, loss value, training token, (15 more...)

2404.10102

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)

Software Mention Recognition with a Three-Stage Framework Based on BERTology Models at SOMD 2024

Thi, Thuy Nguyen, Viet, Anh Nguyen, Van, Thin Dang, Thuy, Ngan Nguyen Luu

arXiv.org Artificial Intelligence, Apr-23-2024

This paper describes our systems for Sub-task I in the Software Mention Detection in Scholarly Publications shared task. We propose three approaches leveraging different pre-trained language models (BERT, SciBERT, and XLM-R) to tackle this challenge. Our best-performing system addresses the named entity recognition (NER) problem through a three-stage framework. (1) Entity Sentence Classification - classifies sentences containing potential software mentions; (2) Entity Extraction - detects mentions within classified sentences; (3) Entity Type Classification - categorizes detected mentions into specific software types. Experiments on the official dataset demonstrate that our three-stage framework achieves competitive performance, surpassing both other participating teams and our alternative approaches. As a result, our framework based on the XLM-R-based model achieves a weighted F1-score of 67.80%, delivering our team the 3rd rank in Sub-task I for the Software Mention Recognition task.
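The control flow of the three-stage framework can be sketched as follows. The paper's stages are fine-tuned language models; here each stage is replaced by a toy lexicon lookup so the example runs on its own. The lexicon, the type labels, and all function names are hypothetical.

```python
import re

# Hypothetical lexicon standing in for the trained models (made-up type labels).
KNOWN_SOFTWARE = {"spss": "application", "numpy": "package"}

def stage1_has_mention(sentence):
    """Stage 1 stand-in: flag sentences that may contain a software mention."""
    lowered = sentence.lower()
    return any(name in lowered for name in KNOWN_SOFTWARE)

def stage2_extract(sentence):
    """Stage 2 stand-in: pull candidate mentions out of a flagged sentence."""
    tokens = re.findall(r"[A-Za-z0-9_]+", sentence)
    return [token for token in tokens if token.lower() in KNOWN_SOFTWARE]

def stage3_classify(mention):
    """Stage 3 stand-in: assign a software type to an extracted mention."""
    return KNOWN_SOFTWARE[mention.lower()]

def pipeline(sentences):
    """Run the stages in order; later stages only see what earlier ones pass on."""
    results = []
    for sentence in sentences:
        if not stage1_has_mention(sentence):
            continue  # sentence filtered out before extraction
        for mention in stage2_extract(sentence):
            results.append((mention, stage3_classify(mention)))
    return results

sentences = [
    "We analysed the data with SPSS.",
    "The weather was sunny.",
    "Arrays were handled using NumPy.",
]
found = pipeline(sentences)
print(found)
```

Filtering sentences first means the extraction stage only runs on sentences already judged likely to contain a mention, at the cost that any sentence wrongly rejected in stage 1 cannot be recovered later.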

language model, recognition, three-stage framework, (14 more...)

2405.01575

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Universal Auto-encoder Framework for MIMO CSI Feedback

So, Jinhyun, Kwon, Hyukjoon

arXiv.org Artificial Intelligence, Mar-1-2024

Existing auto-encoder (AE)-based channel state information (CSI) frameworks have focused on a specific configuration of user equipment (UE) and base station (BS), and thus the input and output sizes of the AE are fixed. However, in the real-world scenario, the input and output sizes may vary depending on the number of antennas of the BS and UE and the allocated resource block in the frequency dimension. A naive approach to support the different input and output sizes is to use multiple AE models, which is impractical for the UE due to the limited HW resources. In this paper, we propose a universal AE framework that can support different input sizes and multiple compression ratios. The proposed AE framework significantly reduces the HW complexity while providing comparable performance in terms of compression ratio-distortion trade-off compared to the naive and state-of-the-art approaches.

compression ratio, dimension, encoder, (15 more...)

2403.00299

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.70)

Industry: Telecommunications (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Text-To-KG Alignment: Comparing Current Methods on Classification Tasks

Wold, Sondre, Øvrelid, Lilja, Velldal, Erik

arXiv.org Artificial Intelligence, Jun-5-2023

In contrast to large text corpora, knowledge graphs (KG) provide dense and structured representations of factual information. This makes them attractive for systems that supplement or ground the knowledge found in pre-trained language models with an external knowledge source. This has especially been the case for classification tasks, where recent work has focused on creating pipeline models that retrieve information from KGs like ConceptNet as additional context. Many of these models consist of multiple components, and although they differ in the number and nature of these parts, they all have in common that for some given text query, they attempt to identify and retrieve a relevant subgraph from the KG. Due to the noise and idiosyncrasies often found in KGs, it is not known how current methods compare to a scenario where the aligned subgraph is completely relevant to the query. In this work, we try to bridge this knowledge gap by reviewing current approaches to text-to-KG alignment and evaluating them on two datasets where manually created graphs are available, providing insights into the effectiveness of current methods.

artificial intelligence, machine learning, natural language, (20 more...)

2306.02871

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(7 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
